Abstract: We introduce a low complexity approach to iterative
equalization and decoding, or "turbo equalization", that uses 
clustered models to better match the nonlinear relationship that 
exists between likelihood information from a channel decoder and 
the symbol estimates that arise in soft-input channel equalization. 
The introduced clustered turbo equalizer uses piecewise linear 
models to capture the nonlinear dependency of the linear mini- 
mum mean square error (MMSE) symbol estimate on the symbol 
likelihoods produced by the channel decoder and maintains a 
computational complexity that is only linear in the channel 
memory. By partitioning the space of likelihood information 
from the decoder, based on either hard or soft clustering, and 
using locally-linear adaptive equalizers within each clustered 
region, the performance gap between the linear MMSE equalizer 
and low-complexity, LMS-based linear turbo equalizers can be 
dramatically narrowed. 

Index Terms — Turbo equalization, piecewise linear modelling, 
hard clustering, soft clustering. 

I. Introduction 

Digital communication receivers typically employ a symbol 
detector to estimate the transmitted channel symbols and a 
channel decoder to decode the error correcting code that was 
used to protect the information bits before transmission. There 
has been great interest in enabling interaction between the 
symbol estimation task and the channel decoding task, which 
is often termed "turbo equalization" for digital communication
over channels with inter-symbol interference (ISI). This interest
is due to the dramatic performance gains that can be obtained
with modest complexity [1] over performing these tasks
separately. Turbo equalization methods employing maximum
a posteriori probability (MAP) detectors demonstrate excellent
bit-error-rate (BER) performance; however, their exponential
computational complexity often renders their application
impractical [1]. As an alternative, linear MMSE-based methods
offer performance comparable to MAP-based approaches with
dramatically reduced complexity [1]. However, MMSE-based
approaches still require quadratic computational complexity
in the channel length per output symbol and require adequate
channel knowledge or estimation. To further reduce computational
complexity and improve efficacy over unknown or time-varying
channels, "direct" LMS-adaptive linear equalizers are
often used, requiring only linear complexity [2] in the
regressor vector length, which is often on the order of the
channel delay spread.

While these direct-adaptive methods may reduce computational
complexity and can be shown to converge to
their Wiener (MMSE) solution under stationary environments,
they usually deliver inferior performance compared to linear
MMSE-based methods. A primary reason for this performance
loss is that the Wiener solution is not time-adaptive, but rather
corresponds to the solution of the "stationarized problem"
where the likelihood information from the decoder (which is
by definition a sample-by-sample probability distribution over
the transmitted data sequence and hence non-stationary) is
replaced by a suitable time-averaged quantity [2]. On the other
hand, both the linear MMSE and MAP-based turbo equalizers
(TEQs) treat the log-likelihood ratio (LLR) sequence as
time-varying a priori statistics over the transmitted symbols.
This LLR information is used to construct the linear MMSE
equalizer, which depends nonlinearly and in a time-dependent
manner on the LLR sequence.

In order to reduce the performance gap between LMS- 
adaptive linear TEQ and linear MMSE TEQ, we introduce an 
adaptive approach that can readily follow the time variation 
of the soft decision data and respect the nonlinear dependence 
of the MMSE symbol estimates on this LLR sequence while 
maintaining the low computational complexity of the LMS- 
adaptive approach. Specifically, we introduce an adaptive,
piecewise linear equalizer that partitions the space of LLR
vectors from the channel decoder into sets, within each of which
a low complexity LMS-adaptive TEQ can be used. We use a
deterministic annealing (DA) algorithm [3] for soft clustering
the symbol-by-symbol variances of the transmitted symbols,
calculated from the soft information. These variances are
partitioned into K regions with partial membership according to
their assigned association probabilities [3]. For hard clustering,
the association probabilities are either 1 or 0. In each cluster,
a local linear filter is updated where the contribution to the
local update is weighted by the association probabilities [3].
In addition, we also quantify the mean square error (MSE) 
of the approach employing hard clustering and show that 
it converges to the MSE of the linear MMSE equalizer as 
the number of regions and the data length increase. In our 
simulations, we observe that the clustered TEQ significantly 
improves performance over traditional LMS-adaptive linear 
equalizers without any significant computational complexity 
increase. 

In Section II we provide a system description for the
communication link under study. The clustering approach
and the corresponding clustered equalization algorithms are
introduced in Section III. The performance of these algorithms
is demonstrated in Section IV. We conclude the letter with
certain remarks in Section V.
Fig. 1. System block diagram for a bit interleaved coded modulation
transmitter and receiver with a linear TEQ 

II. System Description Under Study 
We consider the linear turbo equalization system shown
in Fig. 1. Information bits at the transmitter are encoded
using forward error correction, interleaved in time, mapped
to channel symbols and transmitted through an ISI channel
with impulse response $h_l$, $l = 0, \ldots, L-1$, of length $L$
and additive noise $w[n]$. The received signal $y[n]$ is given
by $y[n] = \sum_{l=0}^{L-1} h_l x[n-l] + w[n]$, where $h_l$ is assumed
time invariant for notational ease. In Fig. 1, the decoder
and equalizer pass extrinsic log-likelihood ratio information
on the information bits to iteratively improve detection and
decoding. The equalizer produces a priori information $L_a$ and
the decoder computes the extrinsic information $L_e$, which are
fed back to the equalizer [1]. For a linear equalizer with a
feedforward filter $f$ and feedback filter $b$, an estimate of the
transmitted signal can be given by

$$\hat{x}[n] = f^H[n]\, y[n] - b^H[n]\, \bar{x}_n, \quad (1)$$

where, in the notation of footnote 1, $y[n] = [y[n-N_2], \ldots, y[n+N_1]]^T$ and
$\bar{x}_n = [\bar{x}[n-N_2-L+1], \ldots, \bar{x}[n-1], \bar{x}[n+1], \ldots, \bar{x}[n+N_1]]^T$.
The mean symbol values are calculated using the a priori information $L_a$
provided by the SISO decoder, i.e., $\bar{x}[n] = E[x[n] : \{L_a\}]$
and $E[|x[n]|^2 : \{L_a\}] = 1$ [1], where we assumed BPSK
signaling for notational simplicity. If a linear MMSE equalizer
is used in (1), we get

$$f[n] = (H_{-0} V[n] H_{-0}^H + s s^H + \sigma_w^2 I)^{-1} s, \quad b[n] = H_{-0}^H f[n], \quad (2)$$

where $H$ is the channel convolution matrix of size $N \times (N+L-1)$
with $N = N_1 + N_2 + 1$, $s$ is the $(N_2+L)$th column of $H$, $H_{-0}$ is the matrix
where the $(N_2+L)$th column of $H$ is eliminated, $V[n] =
\mathrm{diag}([v[n-N_2-L+1], \ldots, v[n-1], v[n+1], \ldots, v[n+N_1]])$,
$v[n] = E[|x[n]|^2 : L_a] - |\bar{x}[n]|^2$ and $\sigma_w^2$ is the additive noise
variance assuming fixed transmit signal power of 1.
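The soft statistics that enter the equalizer can be illustrated with a minimal sketch. It assumes BPSK symbols in {+1, -1} and the common convention that the a priori LLR is ln(P(x = +1)/P(x = -1)); the function name `bpsk_soft_stats` is hypothetical and not part of the letter.

```python
import math


def bpsk_soft_stats(llr):
    """Return (mean, variance) of a BPSK symbol given its a priori LLR.

    Assumption: llr = ln(P(x = +1) / P(x = -1)).
    """
    mean = math.tanh(llr / 2.0)    # E[x] = P(+1) - P(-1)
    variance = 1.0 - mean * mean   # E[|x|^2] - |E[x]|^2 with E[|x|^2] = 1
    return mean, variance


if __name__ == "__main__":
    for llr in (0.0, 1.0, 4.0, -4.0):
        m, v = bpsk_soft_stats(llr)
        print(f"LLR={llr:+.1f}  mean={m:+.4f}  variance={v:.4f}")
```

An uninformative LLR of 0 gives zero mean and unit variance, while a confident LLR drives the variance toward 0; these per-symbol variances are the quantities collected in the diagonal matrix of variances.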

Remark 1: The linear MMSE equalizer in (2) is time varying
due to the symbol-by-symbol variation of the soft input
variance, $V[n]$, even if $h_l$ is time invariant. The linear MMSE
equalizer is a nonlinear function of $V[n]$. If $h_l$ is also time
varying, then (2) could be readily updated by including this
time variation.
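To make Remark 1 concrete, the following worked special case may help. It is an illustration under assumed dimensions, not taken from the letter: a single-tap feedforward filter ($N_1 = N_2 = 0$, so $N = 1$) and a two-tap channel ($L = 2$), with $\sigma_w^2$ denoting the additive noise variance.

```latex
% Illustrative special case of (2): N = 1, L = 2
y[n] = h_0 x[n] + h_1 x[n-1] + w[n], \qquad
H = [\, h_1 \;\; h_0 \,], \quad s = h_0, \quad H_{-0} = h_1, \quad V[n] = v[n-1],
\\
f[n] = \frac{h_0}{|h_1|^2 \, v[n-1] + |h_0|^2 + \sigma_w^2}, \qquad
b[n] = h_1^{*} f[n].
```

In this case $f[n]$ is a rational, hence nonlinear, function of $v[n-1]$: it equals $h_0 / (|h_0|^2 + \sigma_w^2)$ when the previous symbol is known exactly ($v[n-1] = 0$) and shrinks to $h_0 / (|h_0|^2 + |h_1|^2 + \sigma_w^2)$ when nothing is known about it ($v[n-1] = 1$).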

Unlike the linear MMSE equalizer, "direct" adaptive linear 
TEQs use adaptive updates (e.g. using LMS or RLS), for 
direct estimation of the transmitted symbols by processing 

Footnote 1: All vectors are column vectors denoted by lowercase letters and
matrices are represented by boldface capital letters. $w^H$ is the Hermitian
transpose and $\|w\|$ denotes the $\ell_2$ norm of $w$. $\mathrm{diag}(w)$ represents the
diagonal matrix formed by the elements of $w$ along the diagonal. For a
(random) variable $x$, $\bar{x} = E[x]$. Given $x$ with a distribution defined
from $y$, $E[x : y]$ represents the expectation of $x$ with respect to the
distribution defined from $y$. For a square matrix $S$, $\lambda_{\max}(S)$ denotes
the largest eigenvalue.



the received signal and LLR information without the need for
channel estimation [2]. In general, these approaches use only
the mean vector $\bar{x}_n$ as feedback, i.e., soft decision data are
not considered as a priori probabilities, where each component
of $\bar{x}_n$ is taken as a random variable with zero mean and variance
$\sigma_{\bar{x}}^2$. As an example, if one uses the NLMS direct adaptive
linear equalizer, we have the update

$$e[n] = x[n] - w^H[n]\, u[n],$$
$$w[n+1] = w[n] + \mu\, e^*[n]\, u[n] / \|u[n]\|^2,$$

where $w[n+1] = [f^H[n+1] \;\; -b^H[n+1]]^H$, $u[n] = [y^H[n] \;\; \bar{x}_n^H]^H$,
$\mu$ is the step size and $x[n]$ is equal to the mean
$\bar{x}[n]$. Under this stationarity assumption on $\bar{x}_n$ and LLRs, the
feedforward filter using $\bar{x}_n$ converges to the MSE optimal
Wiener (stationary MMSE) solution

$$f = ((1 - \sigma_{\bar{x}}^2) H_{-0} H_{-0}^H + s s^H + \sigma_w^2 I)^{-1} s \quad (3)$$

and $b = H_{-0}^H f$, assuming zero variance at convergence [1], where
$\sigma_{\bar{x}}^2$ is the variance of the components of $\bar{x}_n$.
The resulting filter in (3) at convergence is time invariant and
is identical to (2) with time averaged soft information [1].
The linear MMSE equalizer in (2) requires $O((N+L)^2)$ computations
per output, whereas (3) requires only $O(N+L)$. Since
(3) is not time varying and implicitly assumes that the soft
information is stationary, there is a large performance gap
between the linear MMSE equalizer in (2) and (3) [1]. We seek to reduce
the performance gap between the direct adaptive methods
and the linear MMSE approach by capturing the
nonlinear dependence of the MMSE solution on the soft
information, without incurring the associated computational
complexity of (2).
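The NLMS recursion described in this section can be sketched in a few lines. This is a minimal illustration using plain Python lists, not the authors' implementation; the names `nlms_step` and `demo_nlms`, the regularizer `eps`, and the toy system identification setup are assumptions introduced here.

```python
import random


def nlms_step(w, u, desired, mu, eps=1e-12):
    """Apply one NLMS update; returns (new_weights, estimate, error).

    w and u are equal-length lists of real or complex numbers.
    eps is an assumed small regularizer that avoids division by zero.
    """
    estimate = sum(wi.conjugate() * ui for wi, ui in zip(w, u))  # w^H u
    error = desired - estimate                                    # e[n]
    norm_sq = sum(abs(ui) ** 2 for ui in u) + eps                 # ||u||^2
    scale = mu * error.conjugate() / norm_sq                      # mu e*[n] / ||u||^2
    new_w = [wi + scale * ui for wi, ui in zip(w, u)]
    return new_w, estimate, error


def demo_nlms(num_steps=500, mu=0.5, seed=0):
    """Identify a fixed real 3-tap weight vector from noiseless data."""
    rng = random.Random(seed)
    w_true = [0.5, -0.3, 0.8]
    w = [0.0, 0.0, 0.0]
    for _ in range(num_steps):
        u = [rng.gauss(0.0, 1.0) for _ in w_true]
        desired = sum(wt * ui for wt, ui in zip(w_true, u))
        w, _, _ = nlms_step(w, u, desired, mu)
    return w, w_true


if __name__ == "__main__":
    w_hat, w_ref = demo_nlms()
    print("estimated:", [round(x, 6) for x in w_hat])
    print("reference:", w_ref)
```

Each step costs a number of operations proportional to the regressor length, which is the linear complexity referred to above. In a turbo equalizer the regressor would stack the received samples and the soft symbol means rather than the random inputs used in this toy demo.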

III. Adaptive Turbo Equalization Using Hard or 
Soft Clustered Linear Models 

We propose to use adaptive local linear filters to model the
nonlinear dependence of the linear MMSE equalizer in (2) on the
variance computed from the soft information generated by the
SISO decoder. We do this by partitioning the space of
variances in (2) into a set of regions, within each of which
a single direct adaptive linear filter is used. As a result, we
can retain the computational efficiency of the direct adaptive
methods, while capturing the nonlinear dependence (and hence
sample-by-sample variation) of the MMSE optimal TEQ.

A. Adaptive Nonlinear Turbo Equalization Based on Hard 
Clustering 

Suppose a hard clustering algorithm is applied to $\{v[n]\}_{n \geq 1}$
after the first turbo iteration to yield $K$ regions $\mathcal{R}_k$, with
the corresponding centroids $\tilde{v}_k$, $k = 1, \ldots, K$. Here, $v[n]$
is the vector formed by the diagonal entries of $V[n]$.
As an example, one might use the K-means algorithm
(LBG VQ) [3]. In the LBG VQ algorithm, the centroids
and the corresponding regions are determined as

$$\tilde{v}_k = \frac{\sum_{n : v[n] \in \mathcal{R}_k} v[n]}{\sum_{n : v[n] \in \mathcal{R}_k} 1}, \qquad
\mathcal{R}_k = \{ v : \|v - \tilde{v}_k\| \leq \|v - \tilde{v}_i\|, \; i = 1, \ldots, K, \; i \neq k \},$$

where the regions $\mathcal{R}_k$ are selected using a greedy algorithm [3].
After the regions are constructed using the VQ algorithm, the corresponding
filters in each region are trained with an appropriate direct
adaptive method, and the estimate of $x[n]$ at each time $n$ is
computed as $\hat{x}[n] = \hat{x}_i[n]$ if $i = \arg\min_k \|v[n] - \tilde{v}_k\|$. For



TABLE I
Pseudocode for adaptive TEQ via hard clustering

Set N_min. K_1 = floor(L_D / N_min),                                  (line A)
i = 1, % First turbo iteration
  for k = 1 : K + 1; [initialization garbled in source] = 0, endfor
  for n = 1 : L_T;
    e[n] = x[n] - w_{(K+1)(1)}^H[n] u[n],
    w_{(K+1)(1)}[n+1] = w_{(K+1)(1)}[n] + mu e*[n] u[n], endfor
  for n = L_T + 1 : L_T + L_D;
    [update using w_{(K+1)(1)}; garbled in source] endfor
for i = 2, ..., % turbo iterations,
  Perform hard clustering, based on modified LBG algorithm.            (line B)
  Outputs: K_i = K, v~_{k(i)} = v~_k,                                  (line C)
  for k(i) = 1 : K_i, % Filter initialization
    if i == 2; f_{k(i)}[1] = f_{K_1+1}[L_T + L_D],
    else k* = argmin ||v~_{k(i)} - v~_{k(i-1)}||^2, k(i-1) = 1, ..., K_{i-1},
      f_{k(i)}[1] = f_{k*}[L_T + L_D],
    b_{k(i)}[1] = b_{k*}[L_T + L_D], endfor
  for n = 1 : L_T, % Training period.
    w_k[n+1] = w_k[n] + mu_k e_k*[n] (I - V_{k(i)})^{1/2} u[n], endfor
  for n = L_T + 1 : L_T + L_D;
    k* = argmin_m ||v~_m - v[n]||^2                                    (line D)
    w_{k*}[n+1] = w_{k*}[n] + mu_k[n] e_k*[n] u[n] / ||u[n]||^2,        (line E)
    mu_k[n] = mu for k = k*, x^[n] = w_{k*}^H[n] u[n] endfor           (line F)
  Go to the Clustering step: Until desired turbo iterations or error rate

the adaptive algorithms to converge in each of these regions, 
we put a constraint on the cluster size such that each cluster
contains at least $N_{\min}$ (the minimum required data length for
suitable convergence) elements and the quantization level is 
equal to or less than that of the original LBG VQ. At each 
time n, the received data is assigned to one of the regions and 
used in an adaptive algorithm to train a locally linear direct 
adaptive equalizer. For a locally NLMS direct adaptive linear 
equalizer, we have the update 

$$e_k[n] = x[n] - w_k^H[n]\, u[n], \quad (4)$$
$$w_k[n+1] = w_k[n] + \mu\, e_k^*[n]\, u[n] / \|u[n]\|^2, \quad v[n] \in \mathcal{R}_k,$$
$$w_i[n+1] = w_i[n], \quad i = 1, \ldots, K, \; i \neq k, \quad (5)$$
$$\hat{x}[n] = \hat{x}_k[n],$$

where $w_k[n+1] = [f_k^H[n+1] \;\; -b_k^H[n+1]]^H$, $u[n] =
[y^H[n] \;\; \bar{x}_n^H]^H$, and $x[n]$ in (4) is equal to either the hard
quantized $\bar{x}[n]$ or the mean $\bar{x}[n]$ in decision directed (DD)
mode. An algorithm description is given in Table I. Here,
$L_T$ and $L_D$ are the lengths of the training data and the transmitted data.
During the training period, perfect knowledge of the transmitted
data $x[n]$ is available, so the $K$ adaptive filters can use
weighted training symbols as input to the feedback filters
in order to enable the filters to converge to a function of
the quantized soft input variance. The weight matrices are
selected as $(I - V_{k(i)})$ at the $i$th turbo iteration. Note that
the complexity of the locally linear adaptive filters is higher
than that of direct equalization due to the clustering step. Since the
clustering is only performed at the start of each iteration with
$O(N+L-1)$ complexity per data symbol, the equalization
complexity is effectively unchanged per output symbol. If the
regions are dense enough such that $v[n] \approx \tilde{v}_k$ for all regions,
then the adaptive filter in the $k$th region converges to $f_k =
(H_{-0} V_k H_{-0}^H + s s^H + \sigma_w^2 I)^{-1} s$, $V_k = \mathrm{diag}(\tilde{v}_k)$, assuming
zero variance at convergence. The difference between the MSE
of the converged filter $f_k$ and the MSE of the linear MMSE
equalizer is given as [1]

$$f_k^H H_{-0} (V[n] - V_k) H_{-0}^H f_k + (1 - f_k^H s) - (1 - f^H[n] s). \quad (6)$$



By defining $A = H_{-0} V_k H_{-0}^H + s s^H + \sigma_w^2 I$, $B = A +
H_{-0} E H_{-0}^H$ and $E = V[n] - V_k$, so that $f_k = A^{-1} s$ and
$f[n] = B^{-1} s$, the difference (6) yields

$$s^H A^{-1} H_{-0} E H_{-0}^H A^{-1} s + s^H (B^{-1} - A^{-1}) s$$
$$= s^H A^{-1} H_{-0} E H_{-0}^H B^{-1} H_{-0} E H_{-0}^H A^{-1} s \quad (7)$$
$$\leq \lambda_{\max}(H_{-0} E H_{-0}^H B^{-1} H_{-0} E H_{-0}^H)\, s^H A^{-2} s \quad (8)$$
$$\leq e_{\max}^2\, \lambda_{\max}^2(H_{-0} H_{-0}^H)\, \lambda_{\min}^{-1}(B)\, s^H A^{-2} s,$$

where $e_{\max}$ is the maximum element of the error diagonal
matrix $E$. Here, (7) follows from $B^{-1} - C^{-1} =
B^{-1}(C - B)C^{-1}$, (8) follows from $\mathrm{tr}(CD) = \mathrm{tr}(DC)$
and $\mathrm{tr}(CD) \leq \lambda_{\max}(C)\,\mathrm{tr}(D)$, and the last line follows
from $\lambda_{\max}(CD) \leq \lambda_{\max}(C)\lambda_{\max}(D)$. Since $\lambda_{\min}(B) >
\sigma_w^2$ and $\lambda_{\max}(H_{-0} H_{-0}^H) \leq \lambda_{\max}(H H^H) \leq (\sum_m |h_m|)^2$
for the Toeplitz matrix $H$, the MSE difference in (6) is
bounded by $C e_{\max}^2$ for some $C < \infty$. Hence, the MSE of the
hard clustered linear equalizer converges to the MSE of the
linear MMSE equalizer as the number of regions increases,
provided there is enough data for training.
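The hard-clustered scheme of this subsection can be sketched as follows. This is a simplified illustration, not the authors' implementation: it uses plain Lloyd (K-means) iterations without the minimum cluster size constraint, and the names `kmeans`, `nearest_centroid` and `hard_clustered_nlms_step`, the deterministic initialization, and the regularizer `eps` are assumptions introduced here.

```python
def dist_sq(a, b):
    """Squared Euclidean distance between two equal-length real vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def nearest_centroid(v, centroids):
    """Index k of the centroid closest to the variance vector v."""
    return min(range(len(centroids)), key=lambda k: dist_sq(v, centroids[k]))


def kmeans(points, k, num_iters=20):
    """Plain Lloyd iterations, initialized from evenly spaced samples."""
    centroids = [list(points[i * len(points) // k]) for i in range(k)]
    for _ in range(num_iters):
        members = [[] for _ in range(k)]
        for p in points:
            members[nearest_centroid(p, centroids)].append(p)
        for j, group in enumerate(members):
            if group:  # keep the old centroid if a region is empty
                dim = len(group[0])
                centroids[j] = [sum(p[d] for p in group) / len(group)
                                for d in range(dim)]
    return centroids


def hard_clustered_nlms_step(filters, centroids, v, u, desired, mu, eps=1e-12):
    """Update only the filter whose region contains v; return (k, estimate)."""
    k = nearest_centroid(v, centroids)
    w = filters[k]
    estimate = sum(wi.conjugate() * ui for wi, ui in zip(w, u))  # w_k^H u
    error = desired - estimate
    norm_sq = sum(abs(ui) ** 2 for ui in u) + eps
    scale = mu * error.conjugate() / norm_sq
    filters[k] = [wi + scale * ui for wi, ui in zip(w, u)]       # others unchanged
    return k, estimate


if __name__ == "__main__":
    cents = kmeans([[0.0], [0.1], [0.9], [1.0]], 2)
    print("centroids:", cents)
    bank = [[0.0], [0.0]]
    print(hard_clustered_nlms_step(bank, cents, [0.9], [1.0], 1.0, 1.0), bank)
```

The per-symbol cost is one nearest-centroid search plus a single NLMS update, so the filtering cost per output stays linear in the regressor length, in line with the complexity discussion above.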

B. Adaptive Nonlinear Turbo Equalization Based on Soft 
Clustering 

Suppose the deterministic annealing (DA) algorithm described
in Table II is used for soft clustering [3] on
$\{v[n]\}_{n \geq 1}$ after the first turbo iteration, to give $K$ clusters
with the corresponding centroids $\tilde{v}_k$ and association
probabilities $P(v[n] | \tilde{v}_k)$, $k = 1, \ldots, K$. Then, at each time $n$, the
vector $v[n]$ can be partially assigned to all $K$ regions using
conditional probabilities yielding the update

$$e_k[n] = x[n] - w_k^H[n]\, u[n],$$
$$w_k[n+1] = w_k[n] + \mu_k[n]\, e_k^*[n]\, u[n] / \|u[n]\|^2, \quad (9)$$
$$\mu_k[n] = \mu\, P(v[n] | \tilde{v}_k), \quad (10)$$

where $w_k[n+1] = [f_k^H[n+1] \;\; -b_k^H[n+1]]^H$, $u[n] =
[y^H[n] \;\; \bar{x}_n^H]^H$ and $\mu_k[n]$ is the fractional step size. To
generate the final output, the outputs of the $K$ linear filters can be
combined by either using another adaptive algorithm [4] or other
combination methods [3]. We use the method in [4] as follows.
At each time $n$, we construct $\tilde{y}[n] = [\hat{x}_1[n], \ldots, \hat{x}_K[n]]^T$ and
produce the final output and update the weight vectors as

$$\hat{x}[n] = \tilde{w}^T[n]\, \tilde{y}[n], \quad (11)$$
$$\tilde{e}[n] = x[n] - \tilde{w}^T[n]\, \tilde{y}[n], \quad (12)$$
$$\tilde{w}[n+1] = \tilde{w}[n] + \tilde{\mu}\, \tilde{e}^*[n]\, \tilde{y}[n] / \|\tilde{y}[n]\|^2, \quad (13)$$

where the tilde marks the quantities of the combining stage
and $\tilde{\mu}$ is a learning rate for this combining step. An update as in
(13) can provide improved steady-state MSE and convergence
speed exceeding that of any of the constituent filters, i.e.,
$\hat{x}_k[n]$, $k = 1, \ldots, K$, under certain conditions [4].
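The soft-clustered update in (9) and (10) and the combining stage in (11) to (13) can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' code: signals are taken as real-valued (as for BPSK), so conjugates are omitted; the association probabilities use the Gibbs form that appears in deterministic annealing; and the function names and the regularizer `eps` are hypothetical.

```python
import math


def association_probabilities(v, centroids, priors, temperature):
    """Gibbs association probabilities of v with each centroid at temperature T."""
    dists = [sum((vi - ci) ** 2 for vi, ci in zip(v, c)) for c in centroids]
    d_min = min(dists)  # shift the exponents for numerical stability
    weights = [p * math.exp(-(d - d_min) / temperature)
               for p, d in zip(priors, dists)]
    total = sum(weights)
    return [wt / total for wt in weights]


def soft_clustered_step(filters, probs, u, desired, mu, eps=1e-12):
    """Update every local filter with fractional step size mu * probs[k].

    Returns the local estimates computed before the update.
    """
    norm_sq = sum(ui * ui for ui in u) + eps
    outputs = []
    for k, w in enumerate(filters):
        estimate = sum(wi * ui for wi, ui in zip(w, u))
        error = desired - estimate
        step = mu * probs[k] * error / norm_sq
        filters[k] = [wi + step * ui for wi, ui in zip(w, u)]
        outputs.append(estimate)
    return outputs


def combine_step(w_comb, outputs, desired, mu_comb, eps=1e-12):
    """NLMS combination of the local estimates; returns (new_weights, estimate)."""
    estimate = sum(wc * o for wc, o in zip(w_comb, outputs))
    error = desired - estimate
    norm_sq = sum(o * o for o in outputs) + eps
    new_w = [wc + mu_comb * error * o / norm_sq
             for wc, o in zip(w_comb, outputs)]
    return new_w, estimate


if __name__ == "__main__":
    probs = association_probabilities([0.0], [[0.0], [1.0]], [0.5, 0.5], 1.0)
    print("association probabilities:", probs)
```

Setting one probability to 1 and the rest to 0 reduces `soft_clustered_step` to the hard-clustered update, which matches the earlier statement that hard clustering is the special case with association probabilities of 1 or 0.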

The algorithm description is the same as in Table I, except
that line A is removed and $K_1$ is set to $K_{\max}$, and soft
clustering [3] is used in line B. In line C, we add an $L_D \times K_i$
probability matrix corresponding to $P(v[n] | \tilde{v}_k)$ to the outputs.
Lines D and E are removed, and (9) and (10) are used for all $k$
instead. Line F is also removed and replaced by (11), (12)
and (13).

IV. Simulation Results 

Throughout the simulations, a time invariant ISI channel 
given by $h_l$ = [0.227, 0.46, 0.688, 0.46, 0.227] is used. We use
Fig. 4. BER comparison in DD mode, where soft
decision value from the decoder is used and TBiter
is the turbo iteration count.

TABLE II
Soft Clustering based on Deterministic Annealing

% Set the maximum number of code vectors, the maximum number of iterations
% and a minimum temperature, i.e., K_max, I_max and T_min.
K = 1, v~_1 = (1/N) sum_n v[n] and P(v~_1) = 1.
T = T_0 % An initial temperature, T_0, should be larger than lambda_max(cov(v, v)).
for n = 1 : N; P(v~_k | v[n]) = [value garbled in source] endfor
if T > T_min;
  T = alpha T for alpha < 1 % Cooling Step
  if K < K_max; j = 0.
    for k = 1 : K;
      if T > T_{c_k}; % Split the kth cluster with slight perturbation
      elseif j = j + 1 endfor
    if j == K; finish DA.
  elseif; finish DA
elseif; finish DA
i = 1
while not converged and i < I_max;
  for k = 1 : K;
    for n = 1 : N;
      P(v~_k | v[n]) = P(v~_k) exp(-||v[n] - v~_k||^2 / T) / sum_k P(v~_k) exp(-||v[n] - v~_k||^2 / T)
    P(v~_k) = sum_n P(v[n]) P(v~_k | v[n]),
    v~_k = sum_n v[n] P(v~_k | v[n]) P(v[n]) / P(v~_k)
  endfor % calculate distortion and check convergence
endwhile % Go to Cooling Step

TABLE III
SNR thresholds in dB of several algorithms

mode                                      | decision directed | training
original NLMS TEQ                         | 10.9              | 6.0
NLMS TEQ w/ hard clustering               | 6.5               | 5.5
NLMS TEQ w/ soft clustering (combine)     | 5.3               | 5.0
NLMS TEQ w/ soft clustering (selection)   | 5.9               | 4.8



a rate 1/2 convolutional code with constraint length 3, random
interleaving and BPSK signaling. We choose $L_T = 1024$,
$L_D = 4096$, $N_{\min} = 500$ and $K_{\max} = 8$. Each NLMS
filter has a length 15 feedforward and length 19 feedback
filter ($N_1 = 9$, $N_2 = 5$) where $\mu = 0.03$. For an NLMS
filter with soft clustered TEQ, the filter length is less than
$K_{\max}$ and $\mu = 0.1$. Fig. 2 and Fig. 3 show EXIT charts
for a conventional NLMS TEQ [2] (LMSTEQ), the switched
NLMS TEQ based on hard clustering with a restriction on the
number of data samples in each cluster (QLMSTEQ) and an
NLMS TEQ based on soft clustering (SQLMSTEQ). In Fig. 2,
hard decision data are used to learn the NLMS filter, while
in Fig. 3 the transmitted signals are used during the data
transmission period. In Fig. 4 we provide the corresponding
BERs. For the soft clustering based NLMS TEQ, the final
output is given by either adaptively combining to minimize
combined MSE with another NLMS filtering as given in
Section III-B or selecting one of the outputs to minimize
instantaneous residual error after filtering.
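The channel part of this setup can be reproduced with a short sketch. It covers only the ISI channel and additive noise, not the coding, interleaving or equalization; the function name `isi_channel`, the seed, and the noise standard deviation used in the demo are assumptions for illustration.

```python
import random

# Channel taps quoted in the simulation setup.
CHANNEL = [0.227, 0.46, 0.688, 0.46, 0.227]


def isi_channel(symbols, taps, noise_std, rng):
    """Return y[n] = sum_l taps[l] * symbols[n - l] + w[n] with Gaussian w[n]."""
    received = []
    for n in range(len(symbols)):
        acc = 0.0
        for l, h in enumerate(taps):
            if n - l >= 0:  # symbols before time 0 are taken as zero
                acc += h * symbols[n - l]
        received.append(acc + rng.gauss(0.0, noise_std))
    return received


if __name__ == "__main__":
    rng = random.Random(1)
    bpsk = [rng.choice((-1.0, 1.0)) for _ in range(8)]
    print("symbols :", bpsk)
    print("received:", [round(y, 3) for y in isi_channel(bpsk, CHANNEL, 0.1, rng)])
```

The taps have approximately unit energy, so the received signal power stays close to the fixed transmit power of 1 assumed in the system description.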

In all simulations, adaptive TEQs based on soft clustering
showed significantly better performance than the hard clustered
adaptive TEQ and the direct adaptive TEQ. In Fig. 2 (i.e., in the
DD mode with hard decision data), the adaptive combination
of adaptive filters showed better performance than selecting
a single filter, since the combination method can mitigate the
worst-case selection [4]. However, in a dynamically changing
feature domain, combining the outputs of the constituent filters
in MSE can lose the benefit from the local linear models
[4]. As shown in Fig. 3, selecting one filter among $K$ filters
shows better performance than the combination of the filters.
As discussed in Section II, the DD-NLMS TEQ can achieve
"ideal" performance, i.e., time-average MMSE TEQ, as the
decision data becomes more reliable. However, there is still a
mutual information gap between the exact MMSE TEQ and
the NLMS adaptive TEQ. As an example, the NLMS TEQ in
Fig. 2 cannot converge to its ideal performance if the tunnel
between the transfer function of the equalizer and that of the
decoder is closed. This point can be identified by measuring the
signal to noise ratio (SNR) threshold. If the SNR is higher than
the SNR threshold, turbo equalization can converge to near
error-free operation. Otherwise, turbo equalization stalls and
fails to improve after a few iterations. The SNR thresholds
for each equalization algorithm are given in
Table III. Adaptive nonlinear TEQs based on soft clustering
yielded a 0.5 dB gain in SNR threshold compared to the adaptive
nonlinear TEQ based on hard clustering and about a 1 dB
gain compared to the conventional adaptive linear TEQ.

V. Conclusion 
We introduced adaptive locally linear filters based on hard 
and soft clustering to model the nonlinear dependency of the 
linear MMSE turbo-equalizer on soft information from the 
decoder. The adaptive equalizers have computational
complexity on the order of an ordinary direct adaptive linear
equalizer. The local adaptive filters are updated either based
on their associated region using hard clustering or fractionally
based on association probabilities in soft clustering. Through
simulations, the superiority of the proposed algorithms is
demonstrated.

References 

[1] M. Tüchler, R. Koetter, and A. Singer, "Turbo equalization: principles
and new results," IEEE Trans. Commun., vol. 50, no. 5, pp. 754-767, 
May 2002. 

[2] C. Laot, A. Glavieux, and J. Labat, "Turbo equalization: adaptive equal- 
ization and channel decoding jointly optimized," IEEE Jour. Select. Areas 
in Commun., vol. 19, no. 9, pp. 1744-1752, Sep 2001. 

[3] A. Gersho and R. M. Gray, Vector Quantization and Signal Compression. 
Kluwer Academic Pub. Co., 1992. 

[4] S. S. Kozat, A. E. Erdogan, A. C. Singer, and A. H. Sayed, "Steady-state 
MSE performance analysis of mixture approaches to adaptive filtering," 
IEEE Trans. Sig. Proc., vol. 58, pp. 4050-4063, Aug. 2010.



